ANOVA Assisted Variable Selection in High-dimensional Multicategory Response Data
Abstract
Multinomial logistic regression is preferred in the classification of multicategory response data for its ease of interpretation and its ability to identify the input variables associated with each category. However, identifying important input variables in high-dimensional data poses several challenges, as the majority of variables are unnecessary for discriminating between the categories. Frequently used techniques include regularisation techniques such as the Least Absolute Shrinkage and Selection Operator (LASSO), sure independence screening (SIS), or combinations of both. In this paper, we propose the use of ANOVA to assist SIS in variable screening for high-dimensional data when the response variable is multicategorical. The new approach is straightforward and computationally effective. Simulated data without and with correlation are generated for numerical studies to illustrate the methodology, and the results of applying the methods on real data are presented. In conclusion, the performance of ANOVA is comparable with that of SIS in variable selection for uncorrelated input variables, and a combination of both performs better for correlated input variables.
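The abstract does not spell out the screening procedure, so the following is only a minimal sketch of one plausible reading: rank each input variable by its one-way ANOVA F-statistic across the response categories and keep the top d. The function names `anova_f_scores` and `anova_screen`, the simulated data, and the cut-off d = floor(n / log n) (a common convention in the screening literature) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of each column of X across the classes in y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    k = classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(p)
    ss_within = np.zeros(p)
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        # Between-group sum of squares: class size times squared mean shift.
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        # Within-group sum of squares: spread around the class mean.
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_screen(X, y, d):
    """Return the sorted indices of the d variables with the largest F-statistics."""
    scores = anova_f_scores(X, y)
    top = np.argsort(scores)[::-1][:d]
    return np.sort(top), scores


# Hypothetical simulated data: 3 categories, 500 uncorrelated inputs,
# of which only the first three carry a class-specific mean shift.
rng = np.random.default_rng(0)
n, p = 150, 500
y = np.repeat([0, 1, 2], n // 3)
X = rng.standard_normal((n, p))
X[y == 1, 0] += 2.0
X[y == 2, 1] += 2.0
X[y == 0, 2] -= 2.0

d = int(n / np.log(n))  # assumed cut-off, not taken from the paper
selected, scores = anova_screen(X, y, d)
print("kept", selected.size, "of", p, "variables")
```

The retained columns `X[:, selected]` could then be passed to a penalised multinomial logistic regression, in line with the combination of screening and regularisation that the abstract mentions.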
Similar resources
High-Dimensional Variable Selection for Survival Data
The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. In big p and small n problems (where p is the dimension and n is the sample size), the distribution of the minimal depth reveals a...
High Dimensional Variable Selection.
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis tes...
Bayesian Variable Selection in Clustering High-Dimensional Data With Substructure
In this article we focus on clustering techniques recently proposed for high-dimensional data that incorporate variable selection and extend them to the modeling of data with a known substructure, such as the structure imposed by an experimental design. Our method essentially approximates the within-group covariance by facilitating clustering without disrupting the groups defined by the experime...
Bayesian Variable Selection in Clustering High-Dimensional Data
Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p ≫ n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneousl...
Variable Selection and Prediction with Incomplete High-dimensional Data.
We propose a Multiple Imputation Random Lasso (mirl) method to select important variables and to predict the outcome for an epidemiological study of Eating and Activity in Teens. In this study 80% of individuals have at least one variable missing. Therefore, using variable selection methods developed for complete data after listwise deletion substantially reduces prediction power. Recent work o...
Journal
Journal title: Mathematics and Statistics
Year: 2023
ISSN: 2332-2144, 2332-2071
DOI: https://doi.org/10.13189/ms.2023.110110